
    A Unified Operating System for Clouds and Manycore: fos

    Single-chip processors with thousands of cores will be available within the next ten years, and clouds of multicore processors afford the operating system designer thousands of cores today. Constructing operating systems for manycore and cloud systems faces similar challenges. This work identifies these shared challenges and introduces our solution: a factored operating system (fos) designed to meet the scalability, faultiness, variability of demand, and programming challenges of OSes for single-chip thousand-core manycore systems as well as current-day cloud computers. Current monolithic operating systems are not well suited for manycores and clouds because they have taken an evolutionary approach to scaling, such as adding fine-grained locks and redesigning subsystems; however, these approaches do not increase scalability quickly enough. fos addresses the OS scalability challenge by using a message-passing design and is composed of a collection of Internet-inspired servers. Each operating system service is factored into a set of communicating servers which in aggregate implement a system service. These servers are designed much in the way that distributed Internet services are designed, but they provide traditional kernel services instead of Internet services. fos also embraces the elasticity of cloud and manycore platforms by adapting resource utilization to match demand. fos facilitates writing applications across the cloud by providing a single system image across both future 1000+ core manycores and current-day Infrastructure-as-a-Service cloud computers. In contrast, current cloud environments do not provide a single system image and introduce complexity for the user by requiring different programming models for intra- vs. inter-machine communication and by requiring the use of non-OS-standard management tools.
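    The abstract's central idea, replacing shared-memory kernel paths with message-passing servers, can be illustrated with a small sketch. The code below is not fos code; it is a minimal, hypothetical single-machine analogue in C in which one "page allocator" server owns its state privately and a client obtains a page through a request/reply message over a pipe rather than through a locked shared structure.

```c
/* Minimal sketch (not fos code): one OS service factored into a
 * message-passing server that owns its state exclusively, with a
 * client talking to it over a pipe instead of shared memory + locks. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/* Hypothetical request/reply format for a toy page-allocation service. */
struct request { int op;     /* 0 = allocate a page */ int page; };
struct reply   { int status; int page; };

int main(void) {
    int req[2], rep[2];
    if (pipe(req) < 0 || pipe(rep) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {
        /* Server process: owns all allocator state privately, so no locks
         * are needed; clients reach it only through messages. */
        close(req[1]); close(rep[0]);
        int next_free = 0;
        struct request rq;
        while (read(req[0], &rq, sizeof rq) == (ssize_t)sizeof rq) {
            struct reply rp = { 0, 0 };
            if (rq.op == 0) rp.page = next_free++;   /* allocate a page */
            write(rep[1], &rp, sizeof rp);
        }
        _exit(0);
    }

    /* Client: what would be a system call becomes a request/reply pair. */
    close(req[0]); close(rep[1]);
    struct request rq = { 0, 0 };
    struct reply rp;
    write(req[1], &rq, sizeof rq);
    if (read(rep[0], &rp, sizeof rp) == (ssize_t)sizeof rp)
        printf("allocated page %d\n", rp.page);

    close(req[1]);           /* EOF lets the server's read() return 0 */
    wait(NULL);
    return 0;
}
```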

    Graphite: A Distributed Parallel Simulator for Multicores

    This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development for future processors. Several techniques are used to achieve this performance, including direct execution, multi-machine distribution, analytical modeling, and lax synchronization. Graphite is capable of accelerating simulations by leveraging several machines. It can distribute simulation of an off-the-shelf threaded application across a cluster of commodity Linux machines with no modification to the source code. It does this by providing a single, shared address space and a consistent single-process image across machines. Graphite is designed to be a simulation framework, allowing different component models to be easily replaced to either model different architectures or trade off accuracy for performance. We evaluate Graphite from a number of perspectives and demonstrate that it can simulate target architectures containing over 1000 cores on ten 8-core servers. Performance scales well as more machines are added, with near-linear speedup in many cases. Simulation slowdown is as low as 41x versus native execution for some applications. The Graphite infrastructure and existing models will be released as open-source software to allow the community to simulate their own architectures and to extend and improve the framework.
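    Of the techniques listed, lax synchronization is the least conventional, so a small illustration may help. The following C sketch is not Graphite code; it models each simulated core as a thread with a private cycle counter that stalls only when it drifts more than a fixed slack ahead of the slowest core, instead of synchronizing every cycle. The constants and policy are invented for the example.

```c
/* Minimal sketch of lax synchronization (not Graphite code): simulated
 * cores advance private clocks and stall only when they run more than
 * SLACK cycles ahead of the slowest core. */
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

#define NCORES 4
#define SLACK  1000          /* allowed clock skew, in simulated cycles */
#define END    100000

static uint64_t clk[NCORES];             /* per-core simulated clocks */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static uint64_t min_clock(void) {
    uint64_t m = clk[0];
    for (int i = 1; i < NCORES; i++) if (clk[i] < m) m = clk[i];
    return m;
}

static void *core(void *arg) {
    int id = (int)(intptr_t)arg;
    while (clk[id] < END) {
        pthread_mutex_lock(&lock);
        /* Lax barrier: wait only if too far ahead of the slowest core. */
        while (clk[id] > min_clock() + SLACK)
            pthread_cond_wait(&cond, &lock);
        clk[id] += 1 + id;               /* "execute" one instruction */
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NCORES];
    for (int i = 0; i < NCORES; i++)
        pthread_create(&t[i], NULL, core, (void *)(intptr_t)i);
    for (int i = 0; i < NCORES; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NCORES; i++)
        printf("core %d final clock %llu\n", i, (unsigned long long)clk[i]);
    return 0;
}
```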

    Fleets: Scalable Services in a Factored Operating System

    Current monolithic operating systems are designed for uniprocessor systems, and their architecture reflects this. The rise of multicore and cloud computing is drastically changing the tradeoffs in operating system design. The culture of scarce computational resources is being replaced with one of abundant cores, where spatial layout of processes supplants time multiplexing as the primary scheduling concern. Efforts to parallelize monolithic kernels have been difficult and only marginally successful, and new approaches are needed. This paper presents fleets, a novel way of constructing scalable OS services. With fleets, traditional OS services are factored out of the kernel and moved into user space, where they are further parallelized into a distributed set of concurrent, message-passing servers. We evaluate fleets within fos, a new factored operating system designed from the ground up with scalability as the first-order design constraint. This paper details the main design principles of fleets and how the system architecture of fos enables their construction. We describe the design and implementation of three critical fleets (network stack, page allocation, and file system) and compare them with Linux. These comparisons show that fos achieves superior performance and better scalability than Linux for large multicores; at 32 cores, fos's page allocator performs 4.5 times better than Linux's, and fos's network stack performs 2.5 times better. Additionally, we demonstrate how fleets can adapt to changing resource demand, and the importance of spatial scheduling for good performance on multicores.
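    A minimal sketch of the fleet idea follows; it is not fos code. Requests for one service (here a hypothetical page allocator again) are dispatched to one of several independent servers by a simple hash, so the members share no state and can run concurrently on different cores.

```c
/* Minimal sketch of a fleet (not fos code): one OS service implemented
 * by several user-space servers, with each request dispatched to a
 * member chosen by a simple hash, so members never share state. */
#include <stdio.h>

#define FLEET_SIZE 4                     /* hypothetical member count */

/* Each fleet member owns a disjoint slice of the service's state. */
static int requests_handled[FLEET_SIZE];

/* Client-side dispatch: pick the member responsible for this request. */
static int fleet_member_for(unsigned long key) {
    return (int)(key % FLEET_SIZE);
}

/* Stand-in for sending an "allocate page" message to member m. */
static void send_alloc_request(int m, unsigned long key) {
    requests_handled[m]++;               /* member handles it on its own */
    printf("request %lu -> fleet member %d\n", key, m);
}

int main(void) {
    for (unsigned long key = 0; key < 10; key++)
        send_alloc_request(fleet_member_for(key), key);
    for (int m = 0; m < FLEET_SIZE; m++)
        printf("member %d handled %d requests\n", m, requests_handled[m]);
    return 0;
}
```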

    PIKA: A Network Service for Multikernel Operating Systems

    PIKA is a network stack designed for multikernel operating systems that target potential future architectures lacking cache-coherent shared memory but supporting message passing. PIKA splits the network stack into several servers that communicate using a low-overhead message-passing layer. A key challenge faced by PIKA is the maintenance of shared state, such as a single accept queue and load-balance information. PIKA addresses this challenge using a speculative 3-way handshake for connection acceptance and a new distributed load-balancing scheme for spreading connections. A PIKA prototype achieves competitive performance, excellent scalability, and low service times under load imbalance on commodity hardware. Finally, we demonstrate that splitting network stack processing by function across separate cores is a net loss on commodity hardware, and we describe conditions under which it may be advantageous.
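    The load-balancing problem described above can be sketched as follows. This is not PIKA's actual scheme (which relies on a speculative 3-way handshake and a distributed policy); it is a deliberately simple "shortest accept queue wins" placeholder in C that only shows where such a policy plugs in.

```c
/* Minimal sketch of spreading new connections across network-stack
 * servers; the policy below is a simple placeholder, not PIKA's scheme. */
#include <stdio.h>

#define NSERVERS 4

static int queue_len[NSERVERS];          /* connections pending per server */

/* Pick the server whose accept queue is currently shortest. */
static int pick_server(void) {
    int best = 0;
    for (int s = 1; s < NSERVERS; s++)
        if (queue_len[s] < queue_len[best]) best = s;
    return best;
}

int main(void) {
    /* Simulate 20 incoming connections arriving at the listen socket. */
    for (int c = 0; c < 20; c++) {
        int s = pick_server();
        queue_len[s]++;
        printf("connection %2d accepted by server %d\n", c, s);
        if (c % 3 == 0 && queue_len[s] > 0)
            queue_len[s]--;              /* server occasionally drains one */
    }
    for (int s = 0; s < NSERVERS; s++)
        printf("server %d queue length %d\n", s, queue_len[s]);
    return 0;
}
```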

    Providing a Shared File System in the Hare POSIX Multikernel

    Thesis: Ph.D. in Computer Science, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 71-73). Hare is a new multikernel operating system that provides a single system image for multicore processors without cache coherence. Hare allows applications on different cores to share files, directories, file descriptors, sockets, and processes. The main challenge in designing Hare is to support shared abstractions faithfully enough to run applications that run on traditional shared-memory operating systems with few modifications, and to do so while scaling with an increasing number of cores. To achieve this goal, Hare must support shared abstractions (e.g., file descriptors shared between processes) that appear consistent to processes running on any core, but without relying on hardware cache coherence between cores. Moreover, Hare must implement these abstractions in a way that scales (e.g., directories sharded across servers to allow concurrent operations in the same directory). Hare achieves this goal through a combination of new protocols (e.g., a 3-phase commit protocol to implement directory operations correctly and scalably) and leveraging properties of non-cache-coherent multiprocessors (e.g., atomic low-latency message delivery and shared DRAM). An evaluation on a 40-core machine demonstrates that Hare can run many challenging Linux applications (including a mail server and a Linux kernel build) with minimal or no modifications. The results also show that these applications achieve good scalability on Hare, and that Hare's techniques are important to achieving scalability. By Charles Gruenwald, III. Ph.D. in Computer Science.
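    The directory-sharding idea mentioned above can be sketched briefly. This is not Hare code; it only shows the assumed placement step, hashing an entry name to one of several directory servers so that unrelated operations in the same directory can proceed concurrently on different servers.

```c
/* Minimal sketch of directory sharding (not Hare code): entries of one
 * directory are spread across servers by hashing the file name. */
#include <stdio.h>

#define NSERVERS 4

/* djb2 string hash; any stable hash works for this sketch. */
static unsigned long hash_name(const char *name) {
    unsigned long h = 5381;
    for (const char *p = name; *p; p++)
        h = h * 33 + (unsigned char)*p;
    return h;
}

static int server_for_entry(const char *name) {
    return (int)(hash_name(name) % NSERVERS);
}

int main(void) {
    const char *entries[] = { "Makefile", "main.c", "fs.c", "net.c", "README" };
    for (unsigned i = 0; i < sizeof entries / sizeof entries[0]; i++)
        printf("/src/%s -> directory server %d\n",
               entries[i], server_for_entry(entries[i]));
    return 0;
}
```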

    Graphite: A Distributed Parallel Simulator for Multicores

    This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development. Several techniques are used to achieve this, including direct execution, seamless multicore and multi-machine distribution, and lax synchronization. Graphite is capable of accelerating simulations by distributing them across multiple commodity Linux machines. When using multiple machines, it provides the illusion of a single process with a single, shared address space, allowing it to run off-the-shelf pthread applications with no source code modification. Our results demonstrate that Graphite can simulate target architectures containing over 1000 cores on ten 8-core servers. Performance scales well as more machines are added, with near-linear speedup in many cases. Simulation slowdown is as low as 41x versus native execution.
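    The single-process, single-address-space illusion mentioned above can be sketched as an address-homing rule. This is not Graphite's implementation; it is a hypothetical policy in C that assigns each page of the target address space a home machine, which is where an access to that page would be serviced.

```c
/* Minimal sketch of a shared-address-space illusion across machines
 * (not Graphite internals): each target address has a "home" machine,
 * and remote accesses become messages to that home. */
#include <stdio.h>
#include <stdint.h>

#define NMACHINES 10                     /* e.g., ten 8-core servers */
#define PAGE_SIZE 4096

/* Hypothetical policy: home a page on machine (page number mod N). */
static int home_machine(uint64_t target_addr) {
    return (int)((target_addr / PAGE_SIZE) % NMACHINES);
}

int main(void) {
    uint64_t addrs[] = { 0x1000, 0x2a000, 0x7fff0000 };
    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        printf("access to 0x%llx is served by machine %d\n",
               (unsigned long long)addrs[i], home_machine(addrs[i]));
    return 0;
}
```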

    Light-flavor particle production in high-multiplicity pp collisions at $\sqrt{s} = 13$ TeV as a function of transverse spherocity

    Results on the transverse spherocity dependence of light-flavor particle production ($\pi$, K, p, $\phi$, $\mathrm{K}^{*0}$, $\mathrm{K}^{0}_{\mathrm{S}}$, $\Lambda$, $\Xi$) at midrapidity in high-multiplicity pp collisions at $\sqrt{s} = 13$ TeV were obtained with the ALICE apparatus. The transverse spherocity estimator ($S_{\mathrm{O}}^{p_{\mathrm{T}}=1}$) categorizes events by their azimuthal topology. Utilizing narrow selections on $S_{\mathrm{O}}^{p_{\mathrm{T}}=1}$, it is possible to contrast particle production in collisions dominated by many soft initial interactions with that observed in collisions dominated by one or more hard scatterings. Results are reported for two multiplicity estimators covering different pseudorapidity regions. The $S_{\mathrm{O}}^{p_{\mathrm{T}}=1}$ estimator is found to effectively constrain the hardness of the events when the midrapidity ($|\eta| < 0.8$) estimator is used. The production rates of strange particles are found to be slightly higher for soft isotropic topologies, and severely suppressed in hard jet-like topologies. These effects are more pronounced for hadrons with larger mass and strangeness content, and observed when the topological selection is done within a narrow multiplicity interval. This demonstrates that an important aspect of the universal scaling of strangeness enhancement with final-state multiplicity is that high-multiplicity collisions are dominated by soft, isotropic processes. On the contrary, strangeness production in events with jet-like processes is significantly reduced. The results presented in this article are compared with several QCD-inspired Monte Carlo event generators. Models that incorporate a two-component phenomenology, either through mechanisms accounting for string density or thermal production, are able to describe the observed strangeness enhancement as a function of $S_{\mathrm{O}}^{p_{\mathrm{T}}=1}$.
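    For reference, the transverse spherocity estimator used above is conventionally defined with unit transverse-momentum weights as in the snippet below; this definition is assumed from the standard convention, since the abstract does not state it ($\hat{n}$ is a unit vector in the transverse plane and $N_{\mathrm{trk}}$ is the number of selected tracks).

```latex
% Transverse spherocity with all track p_T weights set to 1 (assumed
% standard definition; not stated in the abstract above):
S_{\mathrm{O}}^{p_{\mathrm{T}}=1}
  = \frac{\pi^{2}}{4}\,
    \min_{\hat{n}}
    \left( \frac{\sum_{i} \left| \hat{p}_{\mathrm{T},i} \times \hat{n} \right|}
                {N_{\mathrm{trk}}} \right)^{2}
% S_O -> 0 selects hard, jet-like topologies; S_O -> 1 selects soft,
% isotropic topologies.
```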